Abstract

New global statistical models of nuclidic (atomic) masses based on multilayered
feedforward networks are developed. One goal of such studies is to determine how
well the existing data, and only the data, determines the mapping from the proton
and neutron numbers to the mass of the nuclear ground state. Another is to
provide reliable predictive models that can be used to forecast mass values away
from the valley of stability. Our study focuses mainly on the former goal and
achieves substantial improvement over previous neural-network models of the mass
table by using improved schemes for coding and training. The results suggest
that with further development this approach may provide a valuable complement
to conventional global models.

1 Introduction



The problem of devising global models of nuclidic (atomic) masses has a long
history, going back to the early work of Bohr, von Weizsäcker and Bethe based
on the liquid drop model (see refs. [1,2,3,4,5,6] for reviews). The principal
objectives are (i) a fundamental understanding of the physics of the mass surface
and (ii) the prediction of the masses of "new" nuclides far from stability, both
in the superheavy region and in the regions approaching the proton and neutron
drip lines.

The actual predictions for the masses are of great current interest in
connection with present and future experimental studies of nuclei far from
stability, conducted at heavy-ion and radioactive ion-beam facilities [7]. The
results are also useful for such astrophysical problems as nucleosynthesis and
supernova explosions. The spectrum of models of the atomic mass table ranges from
those with high theoretical input that take explicit account of known physical
principles in terms of a relatively small number of fitting parameters, to
models that are shaped mostly by the data and very little by theory and
thus have a correspondingly larger number of adjustable parameters. Epitomizing
models of the former class are the macroscopic/microscopic models
of Möller, Nix, and coworkers [1,8,9,10], and the semi-microscopic models of
Pearson, Tondeur, and coworkers [11,12,13]. The models of Möller et al. appeal
to the macroscopic descriptions provided by the liquid drop and droplet
models, solve a one-body Schrödinger equation to incorporate single-particle
degrees of freedom, and include pairing through semi-microscopic calculation.
A prominent version, which sets the standard for state-of-the-art theory-based
models, is the finite-range droplet model (FRDM) detailed in ref. [10]. The
models of Pearson, Tondeur, and coworkers are based on the Hartree-Fock
method, with pairing correlations described by either a BCS or a Bogoliubov
treatment. The current version, namely the HFB2 model [13], features the
Bogoliubov approach and an improved Skyrme force.

In this work we use neural networks to develop global nuclear mass models 
which are situated far toward the other end of the spectrum, where one (in 
the ideal) seeks to determine the degree to which the entire mass table is 
determined by the existing experimental data, and only the data. During the 
last decade, artificial neural networks have been utilized to construct predictive 
statistical models in a variety of scientific problems ranging from astronomy 
to experimental high-energy physics to protein structure [14,15]. In a typical 
application, a multilayer feedforward neural network is trained with
back-propagation or some other supervised training algorithm [16,17,18,19] so as
to create a "predictive" statistical model of a certain input-output mapping,
which may in general be physical or mathematical in character. Information
contained in a set of learning examples of the input-output association is
embedded in the weights of the connections between the layered units. This
information may (or may not) be sufficient to allow the trained network to
make reliable predictions for examples outside the learning set. At any rate,
the network is taught to generalize (well or poorly), based on what it has
learned from the set of examples. In the more mundane language of function
approximation, the neural-network model provides a means for interpolation
or extrapolation.

Nuclear physics offers especially rich territory for "data mining" with neural 
nets. On the one hand, a huge collection of high-quality experimental data 
is available for diverse properties of more than 2000 nuclides. On the other, 
quantitative calculation of some properties of some classes of nuclei presents 
difficult challenges even for the best ab initio quantum-mechanical theories and 
phenomenological macroscopic/microscopic models. To date, global
neural-network models have been developed for the stability/instability
dichotomy, for the atomic-mass table, for neutron separation energies, for spins
and parities, for decay branching probabilities of nuclear ground states, and
for β-decay half-lives [20,21,22,23,24,25,26,27,28,29,30].

In the work to be described here, neural nets have been trained to predict
the nuclear mass excess or "defect" $\Delta M$, continuing the program established
in refs. [20,21,23,28,29]. In Sec. 2, we outline the training methodology that
has been applied and specify the data sets used in the modeling process. The 
results are presented and discussed in Sec. 3. Finally, Sec. 4 states the general 
conclusions of the current study and views the prospects for further successes 
and further improvements in statistical prediction of atomic masses. 



2 Design and training of neural-network models 

Our immediate tasks are to specify (i) the structure and unit-dynamics of the 
networks that will be developed to model the mass data and (ii) the algorithm 
for their training. We must also specify (iii) the data sets to be utilized together 
with (iv) the schemes for encoding and decoding input and output data. 

2.1 Architecture and Dynamics 

A multilayer feedforward architecture is adopted, with various numbers of
hidden layers and distributions of units among layers. The gross architecture of
a given net is summarized in the notation
$(I\text{-}H_1\text{-}H_2\text{-}\cdots\text{-}H_L\text{-}O)[P]$, where $P$ is
the total number of weight/bias parameters and $I$, $H_i$, and $O$ are integers
that indicate, respectively, the numbers of neuron-like units in the input layer,
the $i$th intermediate (or "hidden") layer, and the output layer. Unless otherwise
indicated, each unit in a layer is connected to all units in the next layer. The
connection from unit $m$ to unit $n$ is characterized by a real-number weight
$w_{mn}$ with initial value positioned at random in the range [-1, 1].

When a pattern $\mu$ is impressed on the input interface, the activities of the
input units are set in accordance with the coding scheme assumed (see section
2.4). Each unit in a hidden layer or in the output layer receives a stimulus
$u_n = \sum_m w_{mn} a_m$, where the $a_m$ are the activities of the units in the
immediately preceding layer. The activity of a generic unit $m$ in the hidden or
output layers is in general a nonlinear function of its stimulus,
$a_m = g(u_m)$. In our work, the activation function $g(u)$ is taken to have the
logistic form, $g(u) = [1 + \exp(-u)]^{-1}$. The system response may be decoded
from the activities of the units of the output layer, also in accordance with the
coding scheme assumed. The dynamics is particularly simple: the states of all
units within a given layer are updated in parallel and the layers are updated
successively, proceeding from input to output.
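
The layer-by-layer dynamics just described can be sketched in a few lines of
Python. This is only an illustration, not the code used in the study: the
function names are hypothetical, each non-input unit is assumed to carry one
bias (so that the parameter count P includes biases), and the biases are
assumed to be initialized in the same range [-1, 1] as the weights.

```python
import numpy as np

def logistic(u):
    # Logistic activation g(u) = 1 / (1 + exp(-u)).
    return 1.0 / (1.0 + np.exp(-u))

def count_parameters(layers):
    # Weights between consecutive layers plus one bias per non-input unit
    # (the bias convention is an assumption of this sketch).
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layers[:-1], layers[1:]))

def init_network(layers, rng):
    # Weights drawn uniformly from [-1, 1]; biases treated the same way here.
    return [(rng.uniform(-1.0, 1.0, size=(n_in, n_out)),
             rng.uniform(-1.0, 1.0, size=n_out))
            for n_in, n_out in zip(layers[:-1], layers[1:])]

def forward(params, activities):
    # Layers are updated successively from input to output;
    # all units within a layer are updated in parallel.
    for weights, biases in params:
        activities = logistic(activities @ weights + biases)
    return activities

rng = np.random.default_rng(0)
net = init_network([4, 10, 10, 10, 1], rng)
output = forward(net, np.array([0.3, 0.5, 1.0, 0.0]))
```

Under the bias convention assumed here, `count_parameters([4, 10, 10, 10, 1])`
gives 281 and `count_parameters([18, 10, 10, 10, 1])` gives 421, in agreement
with the values of P quoted for these architectures in Section 3.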



2.2 Training Algorithms 



Several training algorithms exist that seek to minimize the cost function with
respect to the network weights. For the cost function we make the traditional
choice of the sum of squared errors calculated over the learning set, or more
specifically

$$E = \sum_\mu E^\mu = \frac{1}{2} \sum_\mu \sum_i \left(t_i^\mu - o_i^\mu\right)^2 , \qquad (1)$$

where $t_i^\mu$ and $o_i^\mu$ denote, respectively, the target and actual
activities of unit $i$ of the output layer for input pattern (or example) $\mu$.
In global modeling of the atomic mass table, the root-mean-square error
$\sigma_{\rm rms}$ has been widely adopted as the key figure of merit, and we
shall use its value, calculated over various data sets, to assess the
performance of neural-network models. When evaluated over the learning set, this
quantity evidently coincides with $(2E/N_p)^{1/2}$, where $N_p$ is the number of
training examples.
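
As a quick numerical check of the relation between Eq. (1) and the rms error,
the following Python sketch evaluates both quantities for a single output unit.
The function names and the sample numbers are hypothetical and serve only to
illustrate the identity; they are not data from the study.

```python
import numpy as np

def sse_cost(targets, outputs):
    # Eq. (1): half the sum of squared errors over the learning set.
    t = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((t - o) ** 2)

def sigma_rms(targets, outputs):
    # Root-mean-square deviation between targets and network outputs.
    t = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    return np.sqrt(np.mean((t - o) ** 2))

# Made-up target and output activities for four training patterns.
targets = np.array([0.30, 0.45, 0.60, 0.75])
outputs = np.array([0.32, 0.44, 0.57, 0.79])

E = sse_cost(targets, outputs)
rms = sigma_rms(targets, outputs)
rms_from_cost = np.sqrt(2.0 * E / len(targets))
```

Here `rms` and `rms_from_cost` agree to rounding error. Note that the sketch
works with whatever units the targets are expressed in; an rms error quoted in
MeV would require the scaled activities to be decoded first.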

The most familiar training algorithm is standard back-propagation [16,18]
(hereafter often denoted SB), according to which the weight update rule to be
implemented upon presentation of pattern $\mu$ is

$$\Delta w_{mn}^{(\mu)} = -\eta \frac{\partial E^{\mu}}{\partial w_{mn}} + \alpha \, \Delta w_{mn}^{(\mu-1)} , \qquad (2)$$

where $\eta$ is the learning rate ($0 < \eta < 1$), $\alpha$ is the momentum
parameter ($0 < \alpha < 1$), and $\mu - 1$ is the pattern impressed on the input
interface one training step earlier. The second term on the right-hand side of
Eq. (2), called the momentum term, serves to damp out the wild oscillations in
weight space that might otherwise occur during the gradient-descent minimization
process that underlies the back-propagation algorithm. In our implementation of
the SB algorithm, the learning rate and momentum parameters remain constant
at 0.5 and 0.9, respectively.
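
The behaviour of the momentum rule can be illustrated with a scalar toy
example in Python. The function name `sb_step` and the constant gradient value
are hypothetical. With a constant gradient (a plateau of the cost surface),
the step produced by a momentum update of this form tends to
$-\eta/(1-\alpha)$ times the gradient, which is a standard property of the rule.

```python
def sb_step(w, grad, prev_dw, eta=0.5, alpha=0.9):
    # Standard back-propagation update with momentum:
    # dw = -eta * dE/dw + alpha * (previous dw).
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw

# Toy plateau: the gradient stays fixed at a small constant value.
w, dw = 0.0, 0.0
g = 0.01
for _ in range(500):
    w, dw = sb_step(w, g, dw)

plateau_step = dw                    # step size reached on the plateau
expected = -0.5 / (1.0 - 0.9) * g    # -eta / (1 - alpha) * gradient
```

After a few hundred steps `plateau_step` is indistinguishable from `expected`,
i.e. the momentum term multiplies the effective step by $1/(1-\alpha) = 10$
for $\alpha = 0.9$.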

Our artificial neural networks are trained with a modified version of the SB
algorithm [26,28,29] that we have found empirically to be advantageous in the
mass-modeling problem. In this new algorithm, to be denoted MB, the weight
update prescription corresponding to Eq. (2) reads

$$\Delta w_{mn}^{(\mu)} = -\eta \frac{\partial E^{\mu}}{\partial w_{mn}} + \alpha \, S_{mn}^{(\mu-1)} , \qquad (3)$$

the momentum term being modified through the quantity

$$S_{mn}^{(\mu-1)} = \frac{S_{mn}^{(\mu-2)} + \Delta w_{mn}^{(\mu-1)}}{e + 1} . \qquad (4)$$

In the latter expression, $e$ is the number of the current epoch, with
$e = 0, 1, 2, 3, \ldots$. An epoch consists of $N_p$ pattern presentations, the
patterns being chosen at random from the set of $N_p$ training examples. Many
epochs of training are required to achieve acceptable performance in the problem
at hand.

The variable $S_{mn}^{(\mu-1)}$ entering Eq. (3) is initialized as follows: For
the first pattern to be presented in epoch $e = 0$, it is set equal to zero. At
the beginning of each new epoch ($e > 0$), it is taken equal to the value reached
by $S_{mn}$ at the end of the immediately preceding epoch. The replacement of
$\Delta w_{mn}^{(\mu-1)}$ by $S_{mn}^{(\mu-1)}$ in the update rule for the generic
weight $w_{mn}$ allows earlier patterns of the current epoch to have more
influence on the training than is the case for standard back-propagation. By the
time $e$ becomes large, $S_{mn}^{(\mu-1)}$ is effectively zero. It can be shown,
after rather lengthy algebra, that if a plateau region of the cost surface has
been reached (i.e. $\partial E/\partial w_{mn}$ remains almost constant) and $e$
is relatively large, then Eq. (3) converges to

I dE 

AWmn = ^ , 5 

I- a OWmn 



thus achieving an effective learning rate twice that of the SB algorithm (cf. [16]). 
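
A minimal Python sketch of one MB update for a single weight is given below.
It assumes that Eq. (4) is read as
$S^{(\mu-1)} = (S^{(\mu-2)} + \Delta w^{(\mu-1)})/(e+1)$; this reading, the
function name `mb_step`, and the toy gradient are assumptions of the sketch
and not a statement of the authors' implementation.

```python
def mb_step(w, grad, s_prev, epoch, eta=0.5, alpha=0.9):
    # Eq. (3): the momentum term uses the accumulated quantity S
    # in place of the previous weight change.
    dw = -eta * grad + alpha * s_prev
    # Eq. (4), as read in this sketch: S accumulates the weight changes
    # and is divided by (e + 1), where e is the current epoch number.
    s_new = (s_prev + dw) / (epoch + 1)
    return w + dw, s_new

# Two pattern presentations in epoch e = 0 with a constant unit gradient.
w, s = 0.0, 0.0                  # S is zero for the first pattern of epoch 0
w, s = mb_step(w, 1.0, s, epoch=0)
w1, s1 = w, s
w, s = mb_step(w, 1.0, s, epoch=0)
w2, s2 = w, s

# The same step late in training: S is strongly damped by the factor 1/(e+1).
_, s_late = mb_step(0.0, 1.0, 0.0, epoch=999)
```

In epoch 0 the quantity S is simply the running sum of all weight changes made
so far, which is how earlier patterns of the epoch gain influence; for large
epoch numbers S becomes very small, in line with the remark in the text.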
Fig. 1. Time course of the learning error ($\sigma_{\rm rms}$) for a network
trained with the modified back-propagation (MB) algorithm defined by eqs. (3)
and (4) (typical example).

In training neural networks, there is always the question of when to stop the 
process. If the network is trained for too short a period, the training data will 
not be adequately fitted. On the other hand, if the training period is continued 
for too long, generalization (i.e. prediction) will suffer, since the "overtrained" 
network will be specialized to the idiosyncrasies of the particular learning set 
that has been supplied. Thus some reasonable compromise must be struck 
between the desiderata of a good fit and good prediction. In our computer 
experiments, we have adopted the following criterion. A given training run 
consists of a relatively large number of epochs, specified beforehand. During 
such a run, we monitor not only the cost function for the patterns in the 
learning set, but also for a separate validation set of nuclei whose masses are 
known. The "trained" network model resulting from a given run is taken as 
that network with the set of connection weights producing the smallest value 
of the cost function on the validation set, over the full course of the run. While 
the members of the validation set are not used in the weight updates of the 
MB (or SB) training rule, they clearly do affect the choice of model. There- 
fore, accuracy on the validation set cannot strictly be regarded as a measure 
of predictive performance, although in practice it may still provide a useful 
indicator of this aspect of the model. To obtain a clean measure of predictive 
performance, still a third set of examples is needed: a test (prediction) set that 
is never referred to during the training process. 
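
The model-selection criterion described above amounts to keeping a snapshot of
the weights whenever the validation error reaches a new minimum during a run
of fixed length. A schematic Python version follows; the callables
`train_epoch` and `validation_error` and the toy model are hypothetical
stand-ins for the actual training and evaluation routines.

```python
import copy

def train_with_validation(model, n_epochs, train_epoch, validation_error):
    # Run a fixed number of epochs and return the snapshot of the model
    # that gave the smallest validation error over the whole run.
    best_err, best_epoch, best_model = float("inf"), None, None
    for epoch in range(n_epochs):
        train_epoch(model, epoch)        # weight updates use the learning set only
        err = validation_error(model)    # the validation set only selects the model
        if err < best_err:
            best_err, best_epoch = err, epoch
            best_model = copy.deepcopy(model)
    return best_model, best_err, best_epoch

# Toy illustration: "training" moves a single weight past its best value.
toy = {"w": 0.0}

def toy_train_epoch(model, epoch):
    model["w"] += 1.0

def toy_validation_error(model):
    return (model["w"] - 3.0) ** 2

best, best_err, best_epoch = train_with_validation(
    toy, 10, toy_train_epoch, toy_validation_error)
```

In the toy run the weight keeps changing for all ten epochs, but the returned
snapshot is the one from the third epoch, where the validation error was
smallest. A clean estimate of predictive performance would still require a
separate test set, as noted in the text.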

Another general problem that one faces in training neural networks is that of
the optimal architecture in terms of the numbers of layers and the numbers of
units in each layer. As in most neural-network applications, we have simply
followed a "trial-and-error" approach to this problem; certainly no claim can
be made that a full optimization has been achieved.

Numerical experiments have shown that performance can be improved by
modifying the learning rate $\eta$ and momentum parameter $\alpha$ during
training. In applying the MB algorithm, $\eta$ and $\alpha$ are assigned starting
values 0.5 and 0.9, respectively, and the validation error is calculated every
five epochs. If this error decreases for two or more consecutive evaluations,
the learning rate $\eta$ (with $0 < \eta < 3$) is increased by 0.02; otherwise
it is decreased by 0.005. The momentum parameter $\alpha$ is usually set to
0.9999 when $e$ becomes relatively large. Comparative studies of the
mass-modeling problem (to be summarized in Table 1 of Section 3) demonstrate
that this training procedure, in conjunction with the MB update rule (3)-(4),
generally yields better results than does the SB algorithm.
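
The learning-rate schedule can be sketched as follows. Several details are
assumptions of this sketch rather than statements from the text: "decreases
for two or more consecutive evaluations" is interpreted as the last three
recorded validation errors being strictly decreasing, the rate is left
unchanged until three evaluations are available, and the bounds used to keep
$\eta$ inside (0, 3) are hypothetical.

```python
def adapt_learning_rate(eta, val_errors, step_up=0.02, step_down=0.005,
                        eta_floor=0.005, eta_ceiling=2.995):
    # val_errors: validation errors recorded at successive evaluations
    # (one evaluation every five epochs in the procedure described above).
    if len(val_errors) < 3:
        return eta  # not enough history yet (assumption of this sketch)
    improving = val_errors[-3] > val_errors[-2] > val_errors[-1]
    eta = eta + step_up if improving else eta - step_down
    # Keep eta strictly inside (0, 3); the exact bounds are hypothetical.
    return min(max(eta, eta_floor), eta_ceiling)

eta = 0.5
history = []
for err in [3.0, 2.5, 2.0, 1.8, 1.9]:   # made-up validation errors
    history.append(err)
    eta = adapt_learning_rate(eta, history)
```

For the made-up error sequence above the rate is raised twice (to 0.52 and
0.54) and then lowered once, ending near 0.535.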

The evolution of the learning error under application of the MB algorithm is
illustrated by the sample shown in fig. 1. We emphasize that this algorithm
departs from gradient descent, allowing the network to escape from local
minima.



2.3 Data Sets 



In exploring the prospects for statistical modeling of nuclear mass excesses,
we have primarily employed a database O+N made up of (i) 1323 "old" (O)
experimental masses which the 1981 Möller-Nix theoretical model [8] was
designed to reproduce, together with (ii) 351 "new" (N) experimental masses,
measured subsequently for nuclei that lie mostly beyond the edges of the 1981
data collection when viewed in the N-Z plane. As discussed in ref. [1], the O
and N data sets were selected as part of a strategy for quantifying the
extrapolation capability ("extrapability") of global mass models - i.e., their
ability to predict the atomic masses for nuclides far from stability. These sets
have also been used in evaluating the predictive performance of neural-network
models of the atomic mass function, with set O providing the fitting data
(learning set) and set N the target data for prediction (test set) (e.g., see
ref. [21]).

To further characterize the interpolation/extrapolation capability of our
models, we have also employed two data sets of 1303 (M1) and 351 (M2) nuclei
and their masses, chosen randomly from the union of the O and N sets, after
excluding 20 nuclides with poorly measured masses. Together, these 1654 cases
form the database fitted by the FRDM parametrization of ref. [10], i.e., by the
best of the mid-1990s theoretical models developed by the Los Alamos-Berkeley
group. We also make use of another set of 158 nuclei (denoted NB) that lie
outside the O and N databases, the experimental masses of these examples
being drawn from the NUBASE evaluation of nuclear and decay properties
[3]. The locations of the nuclei of the O, N, and NB sets in the N-Z plane
are shown in fig. 2.

Fig. 2. Locations in the N-Z plane are indicated for the O, N, and NB data sets
employed in neural-network modeling of nuclear mass excesses (see section 2.3).



2.4 Coding at Input and Output Interfaces 

We have considered several input coding schemes designed to facilitate learning
of quantal properties (pairing, shell structure) that depend on the integral
nature of Z and N (see refs. [20,21,26]). The scheme that achieves this aim
most efficiently while keeping the number of weights to a minimum is one that
implements analog (floating-point) coding of Z and N in terms of the inputs
of only two dedicated analog input neurons, which, however, are aided by two
further binary ("on-off") input units that encode the parity (even or odd) of
Z and N [26]. The analog input units scale the Z and N values to the interval
[0,1] in such a manner that the stimuli received by the logistic units in hidden
and output layers remain within their best dynamical range. A less efficient
scheme, introduced in ref. [20], utilizes banks of on-off units to represent Z
and N as binary integers.

The mass excess computed by the network is represented by the activity of
a single analog output unit. For the same reason as for the input units
representing Z and N, the target mass excess values $\Delta M$ are also scaled
to the interval [0,1].

Several prescriptions have been tried for scaling the Z, N, and $\Delta M$
variables, two of which are represented in the results reported here. Extensive
comparative studies of the MB and SB training algorithms were based on the P1
prescription, which admits the ranges [0, 110], [0, 160], and [-110, 130] for the
variables Z, N, and $\Delta M$, respectively. In later work we adopted the P2
prescription, which allows for the extended respective ranges [0, 130], [0, 200],
and [-110, 250], thereby providing ample room for new nuclei far from the stable
valley. The latter scaling recipe usually gives better results.
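
The parity-aided analog coding and the two scaling recipes can be illustrated
with the Python sketch below. The text fixes only the admitted ranges; the
linear form of the scaling, the ordering of the four inputs, and the convention
that a parity unit is "on" for odd values are assumptions of this sketch, and
all function names are hypothetical.

```python
# Admitted ranges for Z, N and the mass excess under the two recipes.
P1 = {"Z": (0, 110), "N": (0, 160), "dM": (-110, 130)}
P2 = {"Z": (0, 130), "N": (0, 200), "dM": (-110, 250)}

def scale(x, lo, hi):
    # Linear map of [lo, hi] onto [0, 1] (linearity is an assumption here).
    return (x - lo) / (hi - lo)

def unscale(y, lo, hi):
    # Inverse map, used to decode the activity of the output unit.
    return lo + y * (hi - lo)

def encode_nuclide(Z, N, recipe):
    # Two analog inputs plus two binary parity inputs (1.0 = odd, by assumption).
    return [scale(Z, *recipe["Z"]), scale(N, *recipe["N"]),
            float(Z % 2), float(N % 2)]

def encode_mass_excess(dM, recipe):
    return scale(dM, *recipe["dM"])

def decode_mass_excess(activity, recipe):
    return unscale(activity, *recipe["dM"])

inputs = encode_nuclide(82, 126, P2)        # an even-even example
target = encode_mass_excess(-20.0, P2)      # arbitrary illustrative value
recovered = decode_mass_excess(target, P2)
```

With the P2 recipe the example nuclide is encoded as
`[82/130, 126/200, 0.0, 0.0]`, and decoding the scaled target returns the
original value, so the encoding and decoding steps are mutually consistent.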



3 Results 

We first present results of a comparative study of the quality of models
generated with the modified back-propagation training algorithm MB and with
the standard back-propagation routine SB. Seven pairs of models were
constructed, one member of each pair being trained with MB and the other with
SB, and both members of the pair being started from the same choice among
seven different sets of random initial weights. All of these models have
architecture (4-10-10-10-1) [281] and employ the P1 scaling recipe. In all seven
cases, the values of the error measures $\sigma_{\rm rms}$ attained by the MB
algorithm for the learning, validation, and test sets are consistently smaller
than the corresponding values achieved with SB. A similar pattern is expected to
hold for the P2 scaling prescription.

Specializing to the MB algorithm, we have carried out a substantial num- 
ber of computer experiments for networks with various architectures and for 
networks with the same architecture but different random choices of initial 
weights. 

Performance measures of some of our best models (marked with asterisks) are
reported in Table 2, together with similar performance data for two
neural-network models from earlier studies and for three high-quality
theoretical models, two from Möller et al. [1,10] and one due to Pearson et al.
[13]. The table entries are separated into two groups by a pair of horizontal
lines.

The generalization ability (extrapability) of models belonging to the first
group, whether statistical or theoretical, was assessed by treating the N data
set as a test set. In this way, results obtained by the neural-network approach
could be directly compared with the results [1] of the extrapability study
carried out by the Los Alamos-Berkeley group for the FRDM approach (and
especially with the rms error values shown in the first row of Table 2). The first
of the network models selected from earlier studies is of the five-layer
architecture (18-10-10-10-1) [421]; it was constructed by Gernoth et al. [21]
using standard back-propagation, with binary encoding of Z and N and
"redundant" analog encoding of the atomic mass number A and the neutron excess
N - Z. The three-layer network (4-40-1) [245] is due to Kalman, who adopted
analog coding of Z and N and auxiliary parity units for these variables. In
training this model, the input patterns were pre-processed by singular-value
decomposition and the cost function minimized by a Powell-update
conjugate-gradient algorithm (for additional details, see refs. [23,30]).

Table 1

Performance comparison of standard back-propagation (SB) and modified
back-propagation (MB) training algorithms in the task of mass modeling. Results
for the rms error $\sigma_{\rm rms}$ on learning, validation, and test sets are
given for seven pairs of models with network architecture (4-10-10-10-1) [281].
The members of each model pair belong to the same choice among seven initial
sets of random weights. All models have been trained for a total of 20,000
epochs using the P1 scaling recipe defined in Section 2.4.

Set   Algorithm   learning (O) set   validation (N) set   test (NB) set
                  sigma_rms (MeV)    sigma_rms (MeV)      sigma_rms (MeV)
-----------------------------------------------------------------------
1     MB          0.78               1.92                 2.84
      SB          1.27               2.57                 3.12
2     MB          0.78               1.93                 2.34
      SB          1.11               3.73                 4.53
3     MB          0.71               1.25                 1.88
      SB          n no               i.OO                 2.44
4     MB          0.98               1.81                 3.10
      SB          1.31               3.88                 4.36
5     MB          0.69               1.54                 2.14
      SB          1.32               2.06                 2.79
6     MB          0.66               1.65                 2.73
      SB          0.96               1.81                 3.03
7     MB          0.62               1.59                 2.01
      SB          0.98               3.71                 3.97

The network model labeled with one asterisk was one of those created mainly
for evaluating the modified training procedure MB. It has the five-layer
architecture (4-10-10-10-1) [281] and employs parity-aided input coding and the
scaling recipe P1. Utilization of the NB data for validation gives this model an
a priori advantage over the earlier network models, for which only the learning
set was involved in the training process. This advantage is clearly realized in
practice.

Also included in the first group is a set of results [31] obtained recently with
a Support Vector Machine [32,33].

Table 2

Performance measures of global models of the atomic mass table derived from
neural-network technology and from conventional theory and phenomenology (see
text for details). Data sets are indicated in parentheses. Those rms error measures
referring to actual predictions, rather than fits, are printed in italic font.

Network architecture                learning set      validation set    test set
(I-H_1-H_2-...-H_L-O)[P]            sigma_rms (MeV)   sigma_rms (MeV)   sigma_rms (MeV)
---------------------------------------------------------------------------------------
Moller et al. (ref. [1])            0.67 (O)          -                 0.74 (N)
(18-10-10-10-1) [421] (ref. [21])   0.83 (O)          -                 5.98 (N)
  Z & N in binary,
  A & Z - N in analog
(4-40-1) [245] (ref. [30])          1.07 (O)          -                 3.04 (N)
(4-10-10-10-1)* [281]               0.71 (O)          2.28 (NB)         2.16 (N)
  Z & N in analog and parity
Support Vector Machine (ref. [31])  0.70 (O)          -                 0.75 (N)
---------------------------------------------------------------------------------------
Moller et al. (ref. [10])           0.68 (M1)         0.71 (M2)         0.70 (NB)
Pearson et al. (ref. [13])          0.67 (M1)         0.68 (M2)         0.73 (NB)
(4-10-10-10-1)** [281]              0.41 (M1)         0.47 (M2)         1.48 (NB)
  Z & N in analog and parity
(4-10-10-10-1)*** [361]             0.44 (M1)         0.44 (M2)         0.95 (NB)
  Z & N in analog and parity

The network models appearing in the second group employ the scaling recipe
P2 and were trained with the mixed data sets M1 and M2 described in section
2.3. Thus, the training and validation examples include, in this case, members
from both the O and N data sets. The intent was to develop statistical models
that can be compared more directly with the most refined FRDM model of
ref. [10], recalling that the parameters of this model were fit to the 1654
examples of the M1+M2 database. The network model marked with two asterisks
has the same five-layer architecture and number of weight parameters as the
(*) network. In the network model marked with three asterisks, we chose to
introduce connections from the analog input units to all units of all the
hidden layers, to avoid or reduce the degradation of information as it propagates
toward the output unit. However, this innovation comes at the expense of
increasing the number of weight parameters from the 281 of the (**) network to
361. The resulting system is the best neural-network model of the mass table
yet achieved, based on the accuracy of the estimated masses of the NB set
of nuclides, taken as the test set. The corresponding rms error is 0.95 MeV,
which is to be compared with the figure 0.70 MeV obtained in the FRDM
evaluation. Also included in the second group are the corresponding rms errors
for the HFB2 model [13]. The parameters of this model have been adjusted to
an extended data set of 1888 nuclei, which however includes the 158 nuclei of
the NB set.

Fig. 3. Top panel: Deviations from experiment (in MeV) of mass-excess values
predicted by the neural-network model (4-10-10-10-1)*** for the NUBASE (NB)
nuclei identified in fig. 2. The plot represents a projection of the mass surface onto
a plane of constant Z and thus shows dependence on neutron number N. Bottom
panel: Same for the FRDM evaluation [10].

Further information on the performance of the (***) network is furnished in
fig. 3. Here we compare the deviations from experimental data of the
mass-excess values generated by the net and by the FRDM evaluation, for the
NB nuclei. The extrapolation capability of the (***) network model is better
illustrated in fig. 4, which shows these deviations as a function of the number
of neutrons away from the β stability line.

In spite of its residual shortcomings, the current generation of neural-network
models of the mass table represents a significant step toward extrapability
levels comparable with those reached by the best traditional global models rooted
in quantum theory. The ultimate test of any class of global mass models is the
accuracy that can be realized in the prediction of masses of nuclear species
prior to measurement. A text data file containing the mass-excess values
predicted by the (***) network for 7709 nuclides (Z, N > 8, Z < 120, N < 200) is
available for downloading from http://www.cc.uoa.gr/~sathanas/mass_excess
(see file "massfiles.txt" for details).

Fig. 4. Top panel: Deviations from experiment (in MeV) of mass-excess values
predicted by the neural-network model (4-10-10-10-1)*** for the NUBASE (NB)
nuclei identified in fig. 2, as a function of the number of neutrons away from the
line of β stability. Bottom panel: Same for the FRDM evaluation [10].



4 Conclusions and prospects 



The present investigation is a continuation and an elaboration of a research
thrust [20,21,22,23,24,25] that seeks to develop accurate global models of
nuclear properties with demonstrable predictive power, within the arena of
statistical methods based on multilayer neural networks. Our particular concern
has been the central problem of modeling the systematics of nuclear mass
excesses. We have introduced a modified back-propagation algorithm (MB) along
with new prescriptions for encoding and decoding input and output patterns.
The study has been based mainly on data sets selected by Möller and Nix [1]
for the purpose of testing the extrapolation capabilities of global models of
atomic masses. As seen in Table 2, our best network models of the nuclear
mass excess $\Delta M$ display substantially improved performance relative to
earlier attempts that use neural networks to predict masses far from the valley
of β stability. A strong impetus for further improvement of this approach comes
from the production of new nuclei at radioactive beam facilities and heavy-ion
colliders, as well as from the needs of supernova modeling and state-of-the-art
theories of nucleosynthesis.

In closing, we would like to emphasize the conceptual and structural differences 
between 

(i) statistical models of the atomic-mass function constructed with the aid of 
learning rules operating purely on the experimental data, without any overt 
imposition of physical principles and theory, and 

(ii) the familiar theory-based phenomenological models, constrained by the data. 

Although the latter models become more elaborate as the standards of
description increase, they are nevertheless relatively compact, having relatively
few adjustable parameters, with transparent physical meaning. By contrast,
the neural-network methodology is a more abstract kind of "engine" that
generates a statistical representation of the experimental data having many
parameters. While this representation may have strong predictive power, its
parameters are ordinarily (though not always) opaque to physical
interpretation. In view of these fundamental differences, the two approaches
are more fruitfully viewed as complementary, rather than in competition.

We are currently exploring and implementing a number of refinements of
neural-network approaches to the mass problem. These include the introduction
of diverse pruning and network construction schemes and the application
of other more powerful training (optimization) procedures. Additionally, we
have made some initial attempts to construct an informative statistical model
of the differences between the experimental mass-excess values
$\Delta M^{\rm exp}$ and the theoretical values $\Delta M^{\rm th}$ given by the
FRDM model of Möller et al. [10]. This study is being pursued with the hope of
revealing subtle regularities of nuclear structure not yet embodied in the best
microscopic/phenomenological models of atomic-mass systematics. To date, the
results have not been illuminating in terms of the emergence of systematic
trends - a tentative finding which, if sustained, could imply that the residual
physical corrections to the theoretical model are small but numerous, and of
fluctuating size and sign. Also under investigation is the potential of Support
Vector Machines [32,33] for systematic development of near-optimal statistical
models of atomic masses and other nuclear properties. The results reported in
Table 2 are suggestive of the power of this approach.



5 Addendum 

After the training of the (***) network was completed, the AME03 atomic mass
evaluation [34] was published. This compilation made available precision mass
measurements for nuclei farther off the stability line, while providing corrected
mass-excess values for nuclei already used in our study. The next generation of
neural-network models will be trained using the AME03 data. Already, however,
we can further appraise the extrapability performance of the (***) network,
the best neural-network model of the mass table yet achieved, by making
use of 529 new nuclei included in the AME03 evaluation, which extend beyond
the edges of the 1654-nuclide set M1+M2 as viewed in the N-Z plane. A text
data file containing the predicted mass-excess values for these 529 additional
nuclides is available for downloading from
http://www.cc.uoa.gr/~sathanas/mass_excess (see file "massfiles.txt" for
details). The resulting value of $\sigma_{\rm rms}$ for these nuclei is 1.03 MeV,
which is to be compared with the figures 0.58 MeV and 0.67 MeV obtained in the
FRDM and HFB2 evaluations. When comparing these results, it should be kept in
mind that the parameters of the HFB2 model have been adjusted by making use of an
extended data set of 1888 nuclei, which includes 255 of the 529 nuclides.